{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run PLMSearch locally 🧪\n",
    "\n",
    "**Notice:**\n",
    "\n",
    "**The experiment are implement on a server with a `56-core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz and 256 GB RAM`.**\n",
    "\n",
    "**The GPU environment of the server is `1 × GeForce GTX 1080 Ti and 11 GB GPU Memory`.**\n",
    "\n",
    "## Quick links\n",
    "* [Start from Fasta (preprocessing)](#1)\n",
    "  * [Generate ESM-1b embedding](#1-1)\n",
    "  * [Generate Pfam result](#1-2)\n",
    "* [SS-predictor pipeline](#2)\n",
    "* [PLMSearch pipeline](#3)\n",
    "* [Alignment (Sequence align & TM-align)](#4)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Start from Fasta (preprocessing)\n",
    "<span id=\"1\"></span>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Generate ESM-1b embedding\n",
    "<span id=\"1-1\"></span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Transferred model to GPU\n",
      "Read ./example/protein.fasta with 5 sequences\n",
      "Processing 1 of 1 batches (5 sequences)\n",
      "Embedding generation time cost: 27.4346022605896 s\n"
     ]
    }
   ],
   "source": [
    "#esm generate\n",
    "!python ./plmsearch/embedding_generate.py \\\n",
    "-f './example/protein.fasta' \\\n",
    "-e './example/embedding.pkl' #--nogpu #for CPU-ONLY"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Generate Pfam result\n",
    "PLMSearch requires this input, while SS-predictor `does not`.\n",
    "\n",
    "Important Note: Due to the third-party software Pfamscan, the original ProteinID `should not` contain `Spaces ' '`.\n",
    "<span id=\"1-2\"></span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1715173337.4911733\n",
      "perl ./plmsearch_data/PfamScan/pfam_scan.pl -fasta ./example/protein.fasta -dir ./plmsearch_data/Pfam_db -outfile ./example/tmp.txt\n",
      "Pfam local generate time cost 2.730412721633911 s\n"
     ]
    }
   ],
   "source": [
    "#pfam generate\n",
    "!python ./plmsearch/pfam_generate.py \\\n",
    "-f './example/protein.fasta' \\\n",
    "-o './example/pfam_result.json'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SS-predictor pipeline\n",
    "<span id=\"2\"></span>\n",
    "<div align=center><img src=\"scientist_figures/workflow_img/ss_predictor3.png\" width=\"90%\" height=\"90%\" /></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set Swiss-Prot as target dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding load time cost: 42.21443510055542 s\n",
      "We have 4 GPUs in total!, we will use as you selected\n",
      "Search query proteins batch by batch: 100%|███████| 1/1 [00:08<00:00,  8.24s/it]\n",
      "Search time cost: 9.784892082214355 s\n"
     ]
    }
   ],
   "source": [
    "!python ./plmsearch/main_similarity.py \\\n",
    "-iqe './example/embedding.pkl' \\\n",
    "-ite './plmsearch_data/swissprot/embedding.pkl' \\\n",
    "-smp './plmsearch_data/model/plmsearch.sav' #-d #for CPU-ONLY"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PLMSearch pipeline\n",
    "<span id=\"3\"></span>\n",
    "<div align=center><img src=\"scientist_figures/workflow_img/framework1.png\" width=\"90%\" height=\"90%\" /></div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set Swiss-Prot as target dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[32m[I 231212 23:36:38 main_pfam:8]\u001b[39m query protein num = 5\n",
      "\u001b[32m[I 231212 23:36:38 main_pfam:9]\u001b[39m target protein num = 430140\n",
      "query protein list: 100%|█████████████████████████| 5/5 [00:00<00:00,  6.23it/s]\n"
     ]
    }
   ],
   "source": [
    "#Step 1. generate pfamclan prefilter result\n",
    "!python ./plmsearch/main_pfam.py \\\n",
    "-qpr './example/pfam_result.json' \\\n",
    "-tpr './plmsearch_data/swissprot/pfam_result.json' \\\n",
    "-c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding load time cost: 40.31576943397522 s\n",
      "We have 4 GPUs in total!, we will use as you selected\n",
      "Get search list: 17365it [00:00, 194832.11it/s]\n",
      "\u001b[32m[I 231212 23:37:23 main_similarity:156]\u001b[39m presearch num = 17365\n",
      "Search query proteins batch by batch: 100%|███████| 1/1 [00:03<00:00,  3.86s/it]\n",
      "Search time cost: 5.484592914581299 s\n"
     ]
    }
   ],
   "source": [
    "#Step 2. PLMSearch search\n",
    "!python ./plmsearch/main_similarity.py \\\n",
    "-iqe './example/embedding.pkl' \\\n",
    "-ite './plmsearch_data/swissprot/embedding.pkl' \\\n",
    "-smp './plmsearch_data/model/plmsearch.sav' \\\n",
    "-isr './example/search_result/pfamclan' #-d #for CPU-ONLY"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Alignment (Sequence align & TM-align)\n",
    "<span id=\"4\"></span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pairwise sequence align: 100%|█████████████████| 6/6 [00:00<00:00, 45507.82it/s]\n",
      "sequence align output:   0%|                              | 0/6 [00:00<?, ?it/s]\n",
      "P0AD96\tP0AD96\t1.0\n",
      ">P0AD96\tP0AD96\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYGDQEFTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKYVIGHLCSSSTQPASDIYEDEGILMITPAATAPELTARGYQLILRTTGLDSDQGPTAAKYILEKVKPQRIAIVHDKQQYGEGLARAVQDGLKKGNANVVFFDGITAGEKDFSTLVARLKKENIDFVYYGGYHPEMGQILRQARAAGLKTQFMGPEGVANVSLSNIAGESAEGLLVTKPKNYDQVPANKPIVDAIKAKKQDPSGAFVWTTYAALQSLQAGLNQSDDPAEIAKYLKANSVDTVMGPLTWDEKGDLKGFEFGVFDWHANGTATDAK\n",
      "|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYGDQEFTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKYVIGHLCSSSTQPASDIYEDEGILMITPAATAPELTARGYQLILRTTGLDSDQGPTAAKYILEKVKPQRIAIVHDKQQYGEGLARAVQDGLKKGNANVVFFDGITAGEKDFSTLVARLKKENIDFVYYGGYHPEMGQILRQARAAGLKTQFMGPEGVANVSLSNIAGESAEGLLVTKPKNYDQVPANKPIVDAIKAKKQDPSGAFVWTTYAALQSLQAGLNQSDDPAEIAKYLKANSVDTVMGPLTWDEKGDLKGFEFGVFDWHANGTATDAK\n",
      "  Score=367\n",
      "\n",
      "\n",
      "P0AD96\tP15104\t0.2251655629139073\n",
      ">P0AD96\tP15104\n",
      "MNIKGKALLAGCIALAF--SNMA---LAED---IKVAV-VGA-MSG-PVAQYGDQEFTGAEQA-V-AD--INAK---G-GIK-GNKLQIVKYDDA-C-----D--PKQAVAVANKV-V------N-DGIKYVIGHLCSSST-QPASDIYEDEGIL----MIT---PAATA----PELTA-RGYQ-----LI-L-------R----T----T---------------G------L---D-------SD------QGPTAAKYILEK---VKPQRI-AIVHDKQQ--YG----EGLA--RAVQDG-LK----K--G-NANV-VF------FDG-ITAG--EKDF--ST----L-VAR--LKK----ENIDFVYYGGYHPEMGQ-ILRQARAAGLKTQFMG-PE----GVANVSLS-NI-AGE-----S--A--E--GLLVTKPKNYDQVPANKPIVD--AIKA-K--K--Q------DPS-GAFVWTTYAALQSLQAGLNQSDDP-A-------E---I----AKYLK--AN-SVD-------TVMGPLTWD-EKGDLKG-FEFGVFDWH----ANG------T-A---T-------DA-----K-\n",
      "|                  |  |   |      ||    |   ||  |      |   | |   | |   |      | |   |  |         |     |  |        |  |      | |      |    ||| |  |     ||      |     | | |    |     |        |  |       |    |    |               |      |   |       |       |||    |       |      |   |     ||    |  |  ||     |     |  | ||   |       |   |  |  |     |     | |||  |      |  ||          |  |      |   | |   |     |  |     |  ||      |  |  |  | |    | |        |    ||   |  |  |      ||  |               ||    |  |       |   |    |      || |         || |      ||   || ||    |      ||       | |   |       |      | \n",
      "M----------------TTS--ASSHL---NKGIK---QV--YMS-LP------Q---G-E--KVQA-MYI---WIDGTG--EG--L--------RCKTRTLDSEP--------K-CVEELPEWNFD------G----SSTLQ--S-----EG--SNSDM--YLVP-A-AMFRDP----FR---KDPNKL-VLCEVFKYNRRPAETNLRHTCKRIMDMVSNQHPWFGMEQEYTLMGTDGHPFGWPS-NGFPGPQGP----Y----YCGV-----GA---D---RAYGRDIVE--AHYRA----CL-YAGVKIAGTNA--EV-MPAQWEF--QI--GPCE---GIS-MGDHLWVARFIL--HRVCE--DF----------G-VI------A---T-F--DP-KPIPG--N----WN-GAG-CHTNFSTKAMREENG-L----K-Y--------I--EEAI--EKLSKRHQYHIRAYDP-KG---------------GL----D-NARRLTGFHETSNINDFSA----GVANRS--ASIRIPRTV-G-----QEK---KGYFE----D--RRPSAN-CDPFSVTEALIRTCLLNETGD-EPFQYKN\n",
      "  Score=136\n",
      "\n",
      "\n",
      "P0AD96\tP22768\t0.24050632911392406\n",
      ">P0AD96\tP22768\n",
      "MNIKGKALLAGCIALAFSNMA--LAEDIK--V--A--------VVGA-MSGPVAQY--GDQEFTGAEQ---AVADIN---A-KGGIKGN--KLQI-VKYD--DAC--DP--KQ-----AVA-VANKV-VN--D----G-------I-KYV--I------GHLCSSSTQP-A-S--------D-IYED--EGILM-----------ITPAATA---PELTA---R--GYQLILRTTGL-DS--D---Q-G-PT-A---AK------------Y---ILE-------KV---KPQRIA-IVHDK------Q-QYG------E-GLARA-VQDG-L-----K--KGNAN-V-VFFDGITAGEK--D-F---ST-LVARLKKE-N----IDF-VYYGGYHPEMGQ---I-L--R----QARAAG-LKTQFMGPEGV---AN-VS-LSNIAGESAEGLLVTKPKNY-DQVPANKPI-VDAIKAKK-Q--DPSGAFVWTT--YAALQSLQAG-L--NQ-SDD---PAEIAK--YLKA---------NSVD--TVM------GP----L---TWD--EKG--D--------LK-GFEF------GVFDWH-ANGT--A-----------TDAK------\n",
      "| .|||    .|  ||.|     |  |    |  |        || | |    |    | ||    |    | |      | |  | |   |    |     | |  |   |      ||  | |   |   |    |       | |    |      |  |       | |        | |     |  |            |||       ||      |  |     |     |   |   | | |  |   ||            |   |||       |    |      || |       | |        | ||    |    |     |  |     | |     |   |  | |   |  | ||     |    ||  |       |      | |  |    |   |  | |       |   |  |  |        || | |      |     |   |        |  | |  ||  |  |    |     |  |  |     | |     |            |||   ||       |     |   |    ||   |        |  |  |      | |    |     |           |  |      \n",
      "M-SKGK----VC--LAYS---GGL--D--TSVILAWLLDQGYEVV-AFM----A--NVG-QE----E-DFDA-A---KEKALK--I-G-ACK---FV---CVD-CRED-FVK-DILFPAV-QV-N--AV-YEDVYLLGTSLARPVIAK--AQIDVAKQEG--C------FAVSHGCTGKGNDQI---RFE--L-SFYALKPDVKCITP----WRMPE---FFERFAG-----R----KD-LLDYAAQKGIP-VAQTKAKPWSTDENQAHISYEAGILEDPDTTPPK-DMWK-----LIV-D-PMDAPDQPQ--DLTIDFERGL---PV---KLTYTDNKTSK----EVSV-----T---KPLDVFLAAS-NL-AR----ANGVGRID-IV-------E---DRYINLKSRGCYEQ---A-PL-T-------VLRKA-HV-DL--------EG-L-T-----LD-----K--EV-------RQLRD-S--FV--TPNY----S----RLIYN-GS--YFTP-E---CEY---IRSMIQPSQNSV-NGTV-RVRLYKG-NVIILGRST--KTEK-LYDPTESSMDEL-TG--FLPTDTTG-F---IA---IQAIRIKKYGESKKT--KGEELTL\n",
      "  Score=152\n",
      "\n",
      "\n",
      "P0AD96\tP32352\t0.20824742268041238\n",
      ">P0AD96\tP32352\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVA-VVGA--MSGP-VAQYGDQE-FTGAEQAVADINAKGGIKGNK--LQIVK---YDDAC--DPKQAVAVANKVV--NDGIKYV-IGHLCS-S--STQP---ASDIYED-EGILMITPAA-TAPELTARGYQLI-LRTTGL-DS--DQGPTA-AK--YIL--EKVKPQRIA-I---VHDKQQYGEGL------AR--AVQD-G----LKKGN-ANV-VF------FDGITA-G-EKDFS--TL-V--ARLKKENIDFVYYGG-Y----HPEMG-QILRQARAAGLKTQFMGPEGVANVSLSNI-AG-ESAEGLLVTKPKNYDQV-PAN------KPIVDAIK--AKKQDP-S--GA--FVWTTYAALQS-LQA-G-----LNQSDDPAEIAKY---LKAN---SV---D--T----VMGP-LTW--DEK-GDLKGFEFGVFDWHANGTATDA----K--\n",
      "|  |...||     |      |   |    |||   |    |        ||                     |       |      ||        |    |       |   |  |  |      |       || |       |  |          |    | |   |    | |   |    |         |   |  |    |        |   |    |    |     |   |       | | || | |      |  |  |       |       |    |   | ||     || |      |            |  | ||   |     |    |        |      |  | ||   |  |   |      ||   | | |     |     |         |      |    |  |    |    ||   |   |  |           |          |  \n",
      "M--KFFPLL-----L------L---I---GVVG-YIM---NV-------LFT-------------------TWL----PTNY----MFDP--------K--TLN------EI---C-NSVIS---KHNA------AEG-L------ST--E---------DL----LQD-VRD----ALA-SHY--GDE--------YINRYV--K----E--EWVFNNA-GGA---MGQMIIL----HA--SV-SEYLILF-G-TAVGTE----GHT-GVHFA-------D------DYFTILH---GTQI-----AA-L------P-----------YA-TE-AE---V-----Y---TP--GMTHHLK------KGYA-KQ--YSMPG-GSF------AL--EL-AQGWIPCML-----P------FGFL---DTFS-STLDLYTLYRTV---YLT-ARD--MG--K-----------N------LLQNKKF\n",
      "  Score=101\n",
      "\n",
      "\n",
      "P0AD96\tQ08558\t0.21839080459770116\n",
      ">P0AD96\tQ08558\n",
      "MN------IKG-----KALLAGCIA----LAFSNMA-LA--EDIK-V--AVVG-----AM----SGPVAQYGDQE-FTGAEQAVADINAKGGIKGNKLQIV---KYDDACDP-----KQAV--AVANKVV-NDGIKY-VIGHLCSSSTQPASDIYEDEGILMITPAATAP-E----LT--ARGYQLILRTTGLDSDQG-PTAAKY-ILEK-VKPQRIA----IVHDKQQYGEG--LARAVQDG--LKKGNANVVFFDGIT-A-GEKDF-ST--LVARLKKENI-DF-VYYGGYHPEMG-QI-----LRQARAAGLKTQFMGPE----G-VAN-V--S--LSNIA----GE--SA-EG--------L---LV--TKP---KNYDQVPA--N------KPIV--DAIKAKKQDPS---GAFVWTTYAALQ---S-LQAG---LNQ--SDDPAEIA-KY--L-KANSVDTV--M---GPLT---WDEK--GDL---KG-F----EFGVFDWHANGTATDA---K-\n",
      "|.      |.|     |  |   |     |   |   |   ||   |  |        |     |  |        ||     |             ||     ||           |     || ||   |||    |        |   |     |             |    |   |     |       |    |      |    |     |    | | |        |   |     |     |     |   | |     |   ||| |      |  |    |      |      |       |   |  |     | ||  |  |  |        |   || |         |   |   |     ||| |     |      |  |  | |   ||      |         |    | |  |   |    |    |   |   | ||        |   |  |   |     |     |  |    | |      |         | \n",
      "MSSRVCYHINGPFFIIK--L---I-DPKHL---N--SL-TFED--FVYIA---LLLHKA-NDIDS--V-------LFT-----V-------------LQ--SSGKY------FSSGGK---FSAV-NK--LNDG---DV--------T---S-----E------------VEKVSKL-VSA-----I-------S---SP-----NI---FV-----ANAFAI-H-K------KVL---V---CCL-----N-----G--PAIG----LS-ASLVA-L-----CD-IV----Y-----SQ-NDSVFL-------L---F--P-FSNLGFVA-EVGTSVTL----TQKLG-INSANE-HMIFSTPVLFKEL-IGT--IITKNY-Q---LTNTETFNEK--VLQD-I---KQ---NLEG---------L-YPKSVL--GMKEL--LHS----E--MK-QKLIKA------QAMETNG--TLPFW---ASG--EPFK-RFKQLQE-G------N------RRHKL\n",
      "  Score=114\n",
      "\n",
      "\n",
      "P22768\tP32352\t0.2045028142589118\n",
      ">P22768\tP32352\n",
      "MSKGKVCLAYSGG---LDTSVILAWLLDQ-GYEVVAF---MANVGQEED-F----------DAAKE-KA-LK--IGACKFVCVDCREDF--VKDILFPAVQV---NAVY-EDVYLLGT-S---LARPVIAKA-QIDV--AKQEGCF-AVSHGCTGK-GNDQ--I-RFELSFYALKPDVKCITPWRMPEFFER--F--AGRKDLLDYAAQKGIPVAQTKAKPWSTDEN--QAH-ISYEAGIL-----EDPDTTPPKDMWK-LIVDPMDAPDQPQDLTIDFERGLPVKLTYTDNK---TSKEVSV--TKPLD-V-FLAASNLARANGVGRIDIVEDRY--INLKSR-GCYE-Q-APLTVLRKAHVDLEGL---TLDKEVRQLRDSF--V-TPNYSR-----LIYN--GSYFTPECE---YIRSMIQPSQNSVNGTVRVR----LYK----GNVI-I---L--GRSTKTEK-LYDPTE-SSM--DEL-TGFLPTD--T---TGFIAIQAIRIKKY--GESKKTKGEE-LTL-----\n",
      "|   |           |     |  ||   |  ||     | ||      |          |     |  |   |         |      |  |          ||   |     |  |   |         | ||  |       | ||      | |   | |     |     ||        |  |   |  ||           |       |          |   |     ||     |             ||            |   |  |       |      |  |     |     | |        |      |   | |  | |    |    | |       |      |   |   |          | ||         |     | |         |  ||  |      |         |      |    |   |  |        | | |  ||   | | |  |     |   |      | |      |     |    | |     \n",
      "M---K--------FFPL-----L--LL--IG--VV--GYIM-NV-----LFTTWLPTNYMFD----PK-TL-NEI---------C----NSV--I-------SKHNA--AE-----G-LSTEDL--------LQ-DVRDA------LA-SH-----YG-D-EYINR-----Y-----VK--------E--E-WVFNNAG-----------G-------A--------MGQ--MI-----ILHASVSE------------YLI------------L---F--G-------T---AVGT--E---GHT----GVHF--------A------D---D-YFTI-L---HG---TQIA-------A------LPYAT---E--------AEVYTP----GMTHHL---KKG-Y------AKQY--SM--P------G-----GSFAL--ELAQG---WIPCMLPFG-------FL-D-T-FSS-TLD-LYT--L---YRTVYLT------A-R----DMG-----K---NL-LQNKKF\n",
      "  Score=109\n",
      "\n",
      "sequence align output: 100%|███████████████████| 6/6 [00:00<00:00, 22172.53it/s]\n"
     ]
    }
   ],
   "source": [
    "!python ./plmsearch/sequence_align.py \\\n",
    "-qf './example/protein.fasta' \\\n",
    "-tf './example/protein.fasta' \\\n",
    "-ipr './example/alignment/test'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pairwise tmalign: 100%|████████████████████████| 6/6 [00:00<00:00, 15748.33it/s]\n",
      "tmalign output:   0%|                                     | 0/6 [00:00<?, ?it/s]\n",
      "P0AD96\tP0AD96\t1.0\n",
      ">P0AD96\tP0AD96\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYGDQEFTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKYVIGHLCSSSTQPASDIYEDEGILMITPAATAPELTARGYQLILRTTGLDSDQGPTAAKYILEKVKPQRIAIVHDKQQYGEGLARAVQDGLKKGNANVVFFDGITAGEKDFSTLVARLKKENIDFVYYGGYHPEMGQILRQARAAGLKTQFMGPEGVANVSLSNIAGESAEGLLVTKPKNYDQVPANKPIVDAIKAKKQDPSGAFVWTTYAALQSLQAGLNQSDDPAEIAKYLKANSVDTVMGPLTWDEKGDLKGFEFGVFDWHANGTATDAK\n",
      ":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYGDQEFTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKYVIGHLCSSSTQPASDIYEDEGILMITPAATAPELTARGYQLILRTTGLDSDQGPTAAKYILEKVKPQRIAIVHDKQQYGEGLARAVQDGLKKGNANVVFFDGITAGEKDFSTLVARLKKENIDFVYYGGYHPEMGQILRQARAAGLKTQFMGPEGVANVSLSNIAGESAEGLLVTKPKNYDQVPANKPIVDAIKAKKQDPSGAFVWTTYAALQSLQAGLNQSDDPAEIAKYLKANSVDTVMGPLTWDEKGDLKGFEFGVFDWHANGTATDAK\n",
      "\n",
      "P0AD96\tP15104\t0.24585\n",
      ">P0AD96\tP15104\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAM--------------------------SGPVAQ-------YGDQ-----------------EFTGAEQAVADINAKGGIKGNKLQIVKYDDACDP--------KQAVAVANKVVN--DGIKYVIGHLCSSSTQPASDIYEDEGILMITPAATAPE-----LTARGYQLILRTTGLDSDQGP------TA-------------AKYILEKVKPQR-IA--I--VHD-K-QQYGEGLARAVQDGLKK-GNANVVFFDGITAGEKDFSTLVARLKKENIDFVYY--------------------GGYHPEMGQILRQARAAGLKTQFMGPEGVANVSLSNIAGESAEGLLVTKPKNYDQVPANKPIVDAIKAKKQDPSGAFVWTTYAALQSLQAGLNQSDDPAEI------------------------------------------AKYLKANSVDTVMGPLTW-----------D------------E----------------KGDLKGFEFGVFDWHANGTATDAK---------------------------\n",
      "                                                            .::...       .::.                 .....:::::  :.  .    ...::::::.::.        .... .. ..    ..:...:..     .  ...::::..                 ..:.          .::::..      ..             .:..:..::..  ..  :  ::: : :.:.::  :::. ..   .                                                      .....                    ::::::  :       :  ...:::.                                                                                            ....:......  .::..           .            .                ...::.:.                                           \n",
      "----------------------------------MTTSASSHLNKGIKQVYMSLPQGEKVQAMYIWIDGTGEGLRCKTRTLDSEPKCVEELPEWNFDGSSTLQS--EG--S----NSDMYLVPAAMFRDPFRKDPNKLVL-CE-VF--KYNRRPAETNL-----R--HTCKRIMDM------------VSNQHPWFG----------MEQEYTLMGTDGHPFGWPSNGFPGPQGPYYCGVGADRAY-GRDIVEAHYRACLYAGVKIA--GTNA-EV--MP----------------------------------AQWEFQIGPCEGISMGDHLWVARFI--------------------LHRVCE--D-------F--GVIATFD--------------------------------------------------PKPIPGNWNGAGCHTNFSTKAMREENGLKYIEEAIEKLSKRHQYHIRAYDPKG--GLDNARRLTGFHETSNINDFSAGVANRSASIRIPRTVGQEKKGYFEDRRPSANC----------------DPFSVTEALIRTCLLNETGDEPFQYKN\n",
      "\n",
      "P0AD96\tP22768\t0.27002\n",
      ">P0AD96\tP22768\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYGDQEFTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKY-----------------------------------------------------------------------------VIGHLCSSSTQPASDIYEDEGILMITPAATAPELTARGYQLILRTTGLDSDQGPTAAKYILEKVKPQRIAIVHDKQQYGEGLARAVQD-GLKKGNANVVFFDG--ITAG--EKDFSTLVARLKKENIDFVYYGGYHPEMGQ-ILRQARAAGLKTQF-----MGPEGVANVSLS-------------NIA-GESAEGLLVTKPKN----------------------------------------YDQVPANKPIVDAIKAKKQDPS----------------GAFVWTTYAALQSLQAGLN-QSDD-PAEIAKYLKANSVDTVMGPLT----------------WDEKGDLKGFEFGVFDWHANGTATDAK-----------------------------------------------------------------------------\n",
      "                                                                                                                                                                            ...:     :  :          ...::...               :.:::::::::::..:........:::.:: ::::::::..: ...:... :.:::.  ..    ..:::::::.... ...              ..:..              ...:...:...              ..  .                                                     ...:::...:..::..:....                 ..::::::::::::::::: :::. :::::....                            ..:::....                                                                                               \n",
      "-----------------------------------------------------------------------------------------------MSKGKVCLAYSGGLDTSVILAWLLDQGYEVVAFMANVGQEEDFDAAKEKALKIGACKFVCVDCREDFVKDILFPAVQVNAV-----Y--E----------DVYLLGTS---------------LARPVIAKAQIDVAKQEGCFAVSHGCTGKG-NDQIRFELSFYALKPDVKC-ITPWRMPEFF--ERFAGRKDLLDYAAQ-KGI-------------PVAQTK---------AKPWSTDENQAHISYE-AGILEDPDTTPPKDM-WK-------------LIVDPMDAPDQPQDLTIDFERGLPVKLTYTDNKTSKEVSVTKPLDVFLAASNLARANGVGR-IDIVEDRYINLKSRGCYEQAPLTVLRKAHVDLEGLTLDKEVRQLRDSFVT------------PNYSRLIYNGSYFTPECEYIRSMIQ------------------PSQNSVNGTVRVRLYKGNVIILGRSTKTEKLYDPTESSMDELTGFLPTDTTGFIAIQAIRIKKYGESKKTKGEELTL\n",
      "\n",
      "P0AD96\tP32352\t0.22937\n",
      ">P0AD96\tP32352\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYG--------------------DQE-FTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKYVIGHLC--------------------SSSTQPASDIYEDEGILMITPAATAPELTARGYQL-----------------ILRT-----TGLDSDQGPTAAKYILEKVKPQRIAIVHDK---------QQYGEGLARAVQDGLKKGNANVVFFDGITAGEKDFSTLVARLKKENIDFVYYGGYHPEMGQILRQARAAGLKTQFMGP----EGVANVSLSNIAGESAEGLLVT-------K-P-----KNYDQ-VPANKPIVDAIKAKKQDPSGAFVW-----------------TTYAALQSLQAGLNQSDDPAEIAKYLKANSVDTVMGPLTWDEKGDLKGFEFGVFDWHANGTATDAK---\n",
      "                                                              ..  .:  .                   .:...::::....                                       .....:::::::   .:::.       .:.                      .:::     .:::  :::. : :.  .                     ..::::::                                          ::      ..  :. .:   : .....     ...:::.::      :  ::..       . .     .:::. ..                                       ....::::::. :   .                                                    \n",
      "------------------------------------------MKFFPLLLLIGVVGYIMNVLFT-TWL--P-------------------TNYMFDPKTLNEI-------------------CNSVISKHNAAEGLSTEDLLQDVRDALASHYG---DEYIN-------RYV-----KEEWVFNNAGGAMGQMIILHASVSEYLILF--GTAV-G-TE--G------------HTGVHFADDYFTILHGT------------------------------------------QI------AA--LP-YA---T-EAEVY-TPGMTHHLKKGYA------K--QYSMPGGSFALELAQGWIPCMLPFGFL----------------------DTFSSTLDLYTLYRTVYLTARDMGKNLL-Q---N-------------------------------------------------KKF\n",
      "\n",
      "P0AD96\tQ08558\t0.2916\n",
      ">P0AD96\tQ08558\n",
      "MNIKGKALLAGCIALAFSNMALAEDIKVAVVGAMSGPVAQYGDQEFTGAEQAVADINAKGGIKGNKLQIVKYDDACDPKQAVAVANKVVNDGIKYVIGHLCSSSTQPASDIYEDEGILMITP--------------A--AT--APELTARGYQLILRTTGLDSDQGPTAAKYILEKVKPQRIAIVHDKQQY----------GEGLARAVQDGLKKGNANVVFFDGITAGEKDFSTLVARLKKENID-FV---YYGGYHPEMGQILRQARAAGLKTQFM-----------------------GP-EGVA--------------NVSLSNIAGESAEGLLVTK-PK-N-YDQVPANKPI--VDAIKA-----KKQDPSGAFVWTTYAALQS----------LQA-GLNQSDDPAEIAKYLKANSVDTVMGPLTWDEKGDLKGFE--F-GV---F--DW--H-----AN-------------GTATDAK--\n",
      "                                                                            ....                      ::  :: : ::  ......               .  ..  :::::::::::::..                                           .                       ...:.:::::..:..:.:.:. .:   :::..                                            .. :...              .::::: :  :: :..::: :: : :::.. ..    .          ..:.... ..:.::...            ..  ... .:.:.::.:..:..:        ::.   .:::::  : .:   :  ..  .     .              .        \n",
      "----------------------------------------------------------------------------MSSR----------------------VC--YH-I-NG--PFFIIK-LIDPKHLNSLTFEDFVYIALLLHKANDIDSVLFTVL---------------------------------QSSGKYFSSGG-----------------------KFSAVNKLNDGDVTSEVEKVSKLVSAISSPNI---------------------FVANAFAIHKKVLVCCLNGPAIGLSASLVALCDIVYSQNDSVFLLFPFSN-L--GF-VAEVGTSVTLTQKLGIN-SA--NEH-----MIFSTPVLFKEL-IGTIITKNY--QLTNTETFNEKV-LQDI-KQNLEGLYPKSVLGM--------KEL---LHSEMKQKLIKAQAMETNGTLPFWASGEP-FKRFKQLQEGNRRH------KL\n",
      "\n",
      "P22768\tP32352\t0.24053\n",
      ">P22768\tP32352\n",
      "MSKG--------------------------KVCLAYSGGLDTSVILAWLLDQGYEVVAFMANVGQEEDFD-AAKEKALKIGACK----------------------FVCVD-CREDFVKDILFPAVQVNAVYEDVYLLGTSLARPVIAKAQIDVAKQEGCFAVSHGCTG--------------KGNDQIRFELSFYALKPDVKCITPWRMPEFFERFAGRKDLLDYAAQKGIPVAQTKAKPWSTDENQAHISYEAGILEDPDTTPPKDMWKLIVDPMDAPDQPQDLTIDFERGLPVKLTYTDNKTSKEVSVTKPLDVFLAASNLARANGVGRIDIVEDRY-------INLKSRGCYEQAPLTVLRKAHVDLEGLTLDKEVRQL----------------------RDSFVTPNYSRLIYNGSYFTPECEYIRSMIQPSQNSVNGTVRVRLYKGNVIILGRSTKTEKLYDPTESSMDELTGFLPTDTTGFIAIQAIRIKKYGESKKTKGEELTL------------\n",
      "                              ...:::::.                    ...  .:::.  .                                  .::.. ...::.:::: ::    :::::::: .  :: ::::.  .          ....                 .:.:.                                                      ..::.....:..   ::::::::.  :::.                                                                           . ..::::::.::..                                           .::::. .:. ..  ..:::.                                                                                                   \n",
      "----MKFFPLLLLIGVVGYIMNVLFTTWLPTNYMFDPKT--------------------LNE--ICNSV-IS------------KHNAAEGLSTEDLLQDVRDALASHYGDEYINRYVKEEW-VF----NNAGGAMG-Q--MI-ILHAS--V----------SEYL---ILFGTAVGTEGHTGVHFAD------------------------------------------------------DYFTILHGTQIA---ALPYATEAE--VYTP--------------------------------------------------------------------GMTHHLKK-GYAKQYSMPGGSF---------------------ALELAQGWIPCMLPFGFLDTFSSTLDLY-TLY-RT--VYLTAR---------------------------------------------------------------------------------------DMGKNLLQNKKF\n",
      "tmalign output: 100%|██████████████████████████| 6/6 [00:00<00:00, 21769.74it/s]\n"
     ]
    }
   ],
   "source": [
    "!python ./plmsearch/tmalign.py \\\n",
    "-qsd './example/structure/' \\\n",
    "-tsd './example/structure/' \\\n",
    "-ipr './example/alignment/test'"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "plmsearch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.20"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
